Load the student scores for the test. Here we use the ETH Zurich test data, downloaded from https://pontifex.ethz.ch/s21t5/rate/.
skim(test_scores)
| Name | test_scores |
| Number of rows | 9671 |
| Number of columns | 44 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 41 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| test_version | 0 | 1 | 3 | 4 | 0 | 2 | 0 |
| year | 0 | 1 | 4 | 4 | 0 | 5 | 0 |
| class | 0 | 1 | 13 | 13 | 0 | 5 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| A1_B1 | 0 | 1.00 | 0.97 | 0.21 | 0 | 1 | 1 | 1 | 2 | ▁▁▇▁▁ |
| A2_B0 | 6238 | 0.35 | 0.43 | 0.62 | 0 | 0 | 0 | 1 | 2 | ▇▁▃▁▁ |
| A3_B2 | 0 | 1.00 | 0.70 | 0.51 | 0 | 0 | 1 | 1 | 2 | ▃▁▇▁▁ |
| A4_B3 | 0 | 1.00 | 0.66 | 0.51 | 0 | 0 | 1 | 1 | 2 | ▅▁▇▁▁ |
| A5_B4 | 0 | 1.00 | 1.03 | 0.72 | 0 | 1 | 1 | 2 | 2 | ▃▁▇▁▅ |
| A6_B5 | 0 | 1.00 | 1.06 | 0.51 | 0 | 1 | 1 | 1 | 2 | ▁▁▇▁▂ |
| A7_B6 | 0 | 1.00 | 0.72 | 0.47 | 0 | 0 | 1 | 1 | 2 | ▃▁▇▁▁ |
| A8_B7 | 0 | 1.00 | 0.80 | 0.65 | 0 | 0 | 1 | 1 | 2 | ▅▁▇▁▂ |
| A9_B8 | 0 | 1.00 | 1.08 | 0.74 | 0 | 1 | 1 | 2 | 2 | ▅▁▇▁▆ |
| A10_B9 | 0 | 1.00 | 1.01 | 0.68 | 0 | 1 | 1 | 1 | 2 | ▃▁▇▁▃ |
| A11_B10 | 0 | 1.00 | 0.93 | 0.64 | 0 | 1 | 1 | 1 | 2 | ▃▁▇▁▂ |
| A12_B0 | 6238 | 0.35 | 0.90 | 0.54 | 0 | 1 | 1 | 1 | 2 | ▂▁▇▁▁ |
| A13_B0 | 6238 | 0.35 | 0.60 | 0.69 | 0 | 0 | 0 | 1 | 2 | ▇▁▆▁▂ |
| A14_B12 | 0 | 1.00 | 0.81 | 0.64 | 0 | 0 | 1 | 1 | 2 | ▅▁▇▁▂ |
| A15_B13 | 0 | 1.00 | 0.74 | 0.69 | 0 | 0 | 1 | 1 | 2 | ▇▁▇▁▂ |
| A16_B14 | 0 | 1.00 | 0.61 | 0.80 | 0 | 0 | 0 | 1 | 2 | ▇▁▂▁▃ |
| A17_B15 | 0 | 1.00 | 0.73 | 0.58 | 0 | 0 | 1 | 1 | 2 | ▅▁▇▁▁ |
| A18_B16 | 0 | 1.00 | 0.78 | 0.80 | 0 | 0 | 1 | 1 | 2 | ▇▁▆▁▅ |
| A19_B0 | 6238 | 0.35 | 0.95 | 0.81 | 0 | 0 | 1 | 2 | 2 | ▇▁▇▁▇ |
| A20_B17 | 0 | 1.00 | 0.83 | 0.50 | 0 | 1 | 1 | 1 | 2 | ▂▁▇▁▁ |
| A21_B18 | 0 | 1.00 | 0.92 | 0.69 | 0 | 0 | 1 | 1 | 2 | ▅▁▇▁▃ |
| A22_B19 | 0 | 1.00 | 1.00 | 0.70 | 0 | 1 | 1 | 1 | 2 | ▃▁▇▁▃ |
| A23_B20 | 0 | 1.00 | 0.90 | 0.75 | 0 | 0 | 1 | 1 | 2 | ▆▁▇▁▅ |
| A24_B21 | 0 | 1.00 | 0.89 | 0.64 | 0 | 0 | 1 | 1 | 2 | ▃▁▇▁▂ |
| A25_B0 | 6238 | 0.35 | 0.72 | 0.53 | 0 | 0 | 1 | 1 | 2 | ▅▁▇▁▁ |
| A26_B22 | 0 | 1.00 | 1.07 | 0.44 | 0 | 1 | 1 | 1 | 2 | ▁▁▇▁▂ |
| A27_B23 | 0 | 1.00 | 1.02 | 0.77 | 0 | 0 | 1 | 2 | 2 | ▆▁▇▁▆ |
| A28_B24 | 0 | 1.00 | 1.08 | 0.75 | 0 | 1 | 1 | 2 | 2 | ▅▁▇▁▆ |
| A29_B25 | 0 | 1.00 | 0.88 | 0.70 | 0 | 0 | 1 | 1 | 2 | ▅▁▇▁▃ |
| A30_B0 | 6238 | 0.35 | 0.90 | 0.50 | 0 | 1 | 1 | 1 | 2 | ▂▁▇▁▁ |
| A31_B27 | 0 | 1.00 | 0.63 | 0.69 | 0 | 0 | 1 | 1 | 2 | ▇▁▆▁▂ |
| A32_B28 | 0 | 1.00 | 0.70 | 0.84 | 0 | 0 | 0 | 1 | 2 | ▇▁▃▁▃ |
| A33_B29 | 0 | 1.00 | 0.83 | 0.46 | 0 | 1 | 1 | 1 | 2 | ▂▁▇▁▁ |
| A34_B0 | 6238 | 0.35 | 1.14 | 0.62 | 0 | 1 | 1 | 2 | 2 | ▂▁▇▁▃ |
| A35_B0 | 6238 | 0.35 | 1.31 | 0.76 | 0 | 1 | 1 | 2 | 2 | ▃▁▅▁▇ |
| A36_B0 | 6238 | 0.35 | 1.14 | 0.87 | 0 | 0 | 1 | 2 | 2 | ▆▁▃▁▇ |
| A0_B11 | 3433 | 0.65 | 0.70 | 0.57 | 0 | 0 | 1 | 1 | 2 | ▅▁▇▁▁ |
| A0_B26 | 3433 | 0.65 | 0.80 | 0.65 | 0 | 0 | 1 | 1 | 2 | ▅▁▇▁▂ |
| A0_B30 | 3433 | 0.65 | 1.04 | 0.63 | 0 | 1 | 1 | 1 | 2 | ▂▁▇▁▃ |
| A0_B31 | 3433 | 0.65 | 0.96 | 0.79 | 0 | 0 | 1 | 2 | 2 | ▇▁▇▁▆ |
| A0_B32 | 3433 | 0.65 | 1.37 | 0.81 | 0 | 1 | 2 | 2 | 2 | ▃▁▃▁▇ |
The scores take the values 0 (incorrect), 1 (correct), and 2 (“I don’t know”).
For this analysis, we replace the “2 = I don’t know” scores with 0, treating them as non-correct answers.
test_scores <- test_scores %>%
mutate(across(starts_with("A"), ~ ifelse(. == 2, 0, .)))
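As a quick sanity check, here is the same recoding applied to a toy vector (made-up values, not the real data):

```r
# 2 ("I don't know") is recoded to 0; scores of 0 and 1 pass through unchanged
x <- c(0, 1, 2, 1, 2)
ifelse(x == 2, 0, x)  # 0 1 0 1 0
```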
The number of responses from each cohort:
test_scores %>%
group_by(year) %>%
tally() %>%
gt() %>%
data_color(
columns = c("n"),
colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
)
| year | n |
|---|---|
| 2017 | 1682 |
| 2018 | 1751 |
| 2019 | 1999 |
| 2020 | 2191 |
| 2021 | 2048 |
Raw scores follow roughly the same (approximately normal) distribution each year:
total_scores <- test_scores %>% mutate(total = rowSums(across(where(is.numeric)), na.rm = TRUE))
total_scores %>%
ggplot(aes(x = total)) +
geom_histogram(binwidth = 1) +
facet_wrap(vars(year)) +
theme_minimal()
Mean and standard deviation for each item (note this table is very wide - see the scroll bar at the bottom!):
test_scores %>%
select(-class, -test_version) %>%
group_by(year) %>%
skim_without_charts() %>%
select(-contains("character."), -contains("numeric.p"), -skim_type) %>%
rename(complete = complete_rate) %>%
# make the table wider, i.e. with separate columns for each year's results, with the year at the start of each column name
pivot_wider(names_from = year, values_from = -c(skim_variable, year), names_glue = "{year}__{.value}") %>%
# put the columns in order by year
select(sort(names(.))) %>%
select(skim_variable, everything()) %>%
# use GT to make the table look nice
gt(rowname_col = "skim_variable") %>%
# group the columns from each year
tab_spanner_delim(delim = "__") %>%
fmt_number(columns = contains("numeric"), decimals = 2) %>%
fmt_percent(columns = contains("complete"), decimals = 0) %>%
# change all the numeric.mean and numeric.sd column names to Mean and SD
cols_label(
.list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.mean"), label = "Mean") %>% deframe()
) %>%
cols_label(
.list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.sd"), label = "SD") %>% deframe()
) %>%
data_color(
columns = contains("numeric.mean"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
)
| 2017 | 2018 | 2019 | 2020 | 2021 | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | |
| A1_B1 | 100% | 0 | 0.95 | 0.22 | 100% | 0 | 0.95 | 0.21 | 100% | 0 | 0.96 | 0.21 | 100% | 0 | 0.96 | 0.20 | 100% | 0 | 0.95 | 0.21 |
| A2_B0 | 100% | 0 | 0.31 | 0.46 | 100% | 0 | 0.28 | 0.45 | 0% | 1999 | NaN | NA | 0% | 2191 | NaN | NA | 0% | 2048 | NaN | NA |
| A3_B2 | 100% | 0 | 0.62 | 0.48 | 100% | 0 | 0.64 | 0.48 | 100% | 0 | 0.67 | 0.47 | 100% | 0 | 0.66 | 0.48 | 100% | 0 | 0.67 | 0.47 |
| A4_B3 | 100% | 0 | 0.64 | 0.48 | 100% | 0 | 0.62 | 0.48 | 100% | 0 | 0.64 | 0.48 | 100% | 0 | 0.62 | 0.48 | 100% | 0 | 0.60 | 0.49 |
| A5_B4 | 100% | 0 | 0.47 | 0.50 | 100% | 0 | 0.49 | 0.50 | 100% | 0 | 0.49 | 0.50 | 100% | 0 | 0.49 | 0.50 | 100% | 0 | 0.49 | 0.50 |
| A6_B5 | 100% | 0 | 0.71 | 0.46 | 100% | 0 | 0.73 | 0.44 | 100% | 0 | 0.74 | 0.44 | 100% | 0 | 0.76 | 0.42 | 100% | 0 | 0.74 | 0.44 |
| A7_B6 | 100% | 0 | 0.71 | 0.46 | 100% | 0 | 0.70 | 0.46 | 100% | 0 | 0.69 | 0.46 | 100% | 0 | 0.70 | 0.46 | 100% | 0 | 0.68 | 0.47 |
| A8_B7 | 100% | 0 | 0.47 | 0.50 | 100% | 0 | 0.50 | 0.50 | 100% | 0 | 0.52 | 0.50 | 100% | 0 | 0.60 | 0.49 | 100% | 0 | 0.58 | 0.49 |
| A9_B8 | 100% | 0 | 0.46 | 0.50 | 100% | 0 | 0.46 | 0.50 | 100% | 0 | 0.46 | 0.50 | 100% | 0 | 0.43 | 0.50 | 100% | 0 | 0.42 | 0.49 |
| A10_B9 | 100% | 0 | 0.54 | 0.50 | 100% | 0 | 0.54 | 0.50 | 100% | 0 | 0.55 | 0.50 | 100% | 0 | 0.55 | 0.50 | 100% | 0 | 0.53 | 0.50 |
| A11_B10 | 100% | 0 | 0.57 | 0.49 | 100% | 0 | 0.58 | 0.49 | 100% | 0 | 0.58 | 0.49 | 100% | 0 | 0.59 | 0.49 | 100% | 0 | 0.57 | 0.50 |
| A12_B0 | 100% | 0 | 0.68 | 0.47 | 100% | 0 | 0.71 | 0.45 | 0% | 1999 | NaN | NA | 0% | 2191 | NaN | NA | 0% | 2048 | NaN | NA |
| A13_B0 | 100% | 0 | 0.37 | 0.48 | 100% | 0 | 0.36 | 0.48 | 0% | 1999 | NaN | NA | 0% | 2191 | NaN | NA | 0% | 2048 | NaN | NA |
| A14_B12 | 100% | 0 | 0.56 | 0.50 | 100% | 0 | 0.57 | 0.49 | 100% | 0 | 0.59 | 0.49 | 100% | 0 | 0.54 | 0.50 | 100% | 0 | 0.54 | 0.50 |
| A15_B13 | 100% | 0 | 0.46 | 0.50 | 100% | 0 | 0.46 | 0.50 | 100% | 0 | 0.47 | 0.50 | 100% | 0 | 0.46 | 0.50 | 100% | 0 | 0.45 | 0.50 |
| A16_B14 | 100% | 0 | 0.19 | 0.39 | 100% | 0 | 0.20 | 0.40 | 100% | 0 | 0.20 | 0.40 | 100% | 0 | 0.20 | 0.40 | 100% | 0 | 0.20 | 0.40 |
| A17_B15 | 100% | 0 | 0.59 | 0.49 | 100% | 0 | 0.60 | 0.49 | 100% | 0 | 0.60 | 0.49 | 100% | 0 | 0.59 | 0.49 | 100% | 0 | 0.57 | 0.49 |
| A18_B16 | 100% | 0 | 0.38 | 0.49 | 100% | 0 | 0.36 | 0.48 | 100% | 0 | 0.28 | 0.45 | 100% | 0 | 0.27 | 0.45 | 100% | 0 | 0.28 | 0.45 |
| A19_B0 | 100% | 0 | 0.35 | 0.48 | 100% | 0 | 0.34 | 0.47 | 0% | 1999 | NaN | NA | 0% | 2191 | NaN | NA | 0% | 2048 | NaN | NA |
| A20_B17 | 100% | 0 | 0.72 | 0.45 | 100% | 0 | 0.72 | 0.45 | 100% | 0 | 0.74 | 0.44 | 100% | 0 | 0.71 | 0.46 | 100% | 0 | 0.71 | 0.45 |
| A21_B18 | 100% | 0 | 0.51 | 0.50 | 100% | 0 | 0.52 | 0.50 | 100% | 0 | 0.54 | 0.50 | 100% | 0 | 0.52 | 0.50 | 100% | 0 | 0.51 | 0.50 |
| A22_B19 | 100% | 0 | 0.52 | 0.50 | 100% | 0 | 0.52 | 0.50 | 100% | 0 | 0.52 | 0.50 | 100% | 0 | 0.52 | 0.50 | 100% | 0 | 0.50 | 0.50 |
| A23_B20 | 100% | 0 | 0.41 | 0.49 | 100% | 0 | 0.43 | 0.50 | 100% | 0 | 0.42 | 0.49 | 100% | 0 | 0.42 | 0.49 | 100% | 0 | 0.42 | 0.49 |
| A24_B21 | 100% | 0 | 0.59 | 0.49 | 100% | 0 | 0.58 | 0.49 | 100% | 0 | 0.60 | 0.49 | 100% | 0 | 0.56 | 0.50 | 100% | 0 | 0.57 | 0.50 |
| A25_B0 | 100% | 0 | 0.54 | 0.50 | 100% | 0 | 0.73 | 0.44 | 0% | 1999 | NaN | NA | 0% | 2191 | NaN | NA | 0% | 2048 | NaN | NA |
| A26_B22 | 100% | 0 | 0.80 | 0.40 | 100% | 0 | 0.80 | 0.40 | 100% | 0 | 0.80 | 0.40 | 100% | 0 | 0.81 | 0.39 | 100% | 0 | 0.80 | 0.40 |
| A27_B23 | 100% | 0 | 0.39 | 0.49 | 100% | 0 | 0.40 | 0.49 | 100% | 0 | 0.42 | 0.49 | 100% | 0 | 0.40 | 0.49 | 100% | 0 | 0.41 | 0.49 |
| A28_B24 | 100% | 0 | 0.42 | 0.49 | 100% | 0 | 0.46 | 0.50 | 100% | 0 | 0.45 | 0.50 | 100% | 0 | 0.42 | 0.49 | 100% | 0 | 0.41 | 0.49 |
| A29_B25 | 100% | 0 | 0.48 | 0.50 | 100% | 0 | 0.52 | 0.50 | 100% | 0 | 0.51 | 0.50 | 100% | 0 | 0.49 | 0.50 | 100% | 0 | 0.50 | 0.50 |
| A30_B0 | 100% | 0 | 0.76 | 0.43 | 100% | 0 | 0.73 | 0.44 | 0% | 1999 | NaN | NA | 0% | 2191 | NaN | NA | 0% | 2048 | NaN | NA |
| A31_B27 | 100% | 0 | 0.38 | 0.48 | 100% | 0 | 0.37 | 0.48 | 100% | 0 | 0.39 | 0.49 | 100% | 0 | 0.40 | 0.49 | 100% | 0 | 0.40 | 0.49 |
| A32_B28 | 100% | 0 | 0.23 | 0.42 | 100% | 0 | 0.20 | 0.40 | 100% | 0 | 0.20 | 0.40 | 100% | 0 | 0.21 | 0.41 | 100% | 0 | 0.20 | 0.40 |
| A33_B29 | 100% | 0 | 0.78 | 0.42 | 100% | 0 | 0.76 | 0.43 | 100% | 0 | 0.77 | 0.42 | 100% | 0 | 0.74 | 0.44 | 100% | 0 | 0.75 | 0.43 |
| A34_B0 | 100% | 0 | 0.58 | 0.49 | 100% | 0 | 0.61 | 0.49 | 0% | 1999 | NaN | NA | 0% | 2191 | NaN | NA | 0% | 2048 | NaN | NA |
| A35_B0 | 100% | 0 | 0.32 | 0.47 | 100% | 0 | 0.33 | 0.47 | 0% | 1999 | NaN | NA | 0% | 2191 | NaN | NA | 0% | 2048 | NaN | NA |
| A36_B0 | 100% | 0 | 0.22 | 0.41 | 100% | 0 | 0.23 | 0.42 | 0% | 1999 | NaN | NA | 0% | 2191 | NaN | NA | 0% | 2048 | NaN | NA |
| A0_B11 | 0% | 1682 | NaN | NA | 0% | 1751 | NaN | NA | 100% | 0 | 0.57 | 0.50 | 100% | 0 | 0.59 | 0.49 | 100% | 0 | 0.58 | 0.49 |
| A0_B26 | 0% | 1682 | NaN | NA | 0% | 1751 | NaN | NA | 100% | 0 | 0.54 | 0.50 | 100% | 0 | 0.53 | 0.50 | 100% | 0 | 0.52 | 0.50 |
| A0_B30 | 0% | 1682 | NaN | NA | 0% | 1751 | NaN | NA | 100% | 0 | 0.61 | 0.49 | 100% | 0 | 0.61 | 0.49 | 100% | 0 | 0.60 | 0.49 |
| A0_B31 | 0% | 1682 | NaN | NA | 0% | 1751 | NaN | NA | 100% | 0 | 0.35 | 0.48 | 100% | 0 | 0.33 | 0.47 | 100% | 0 | 0.45 | 0.50 |
| A0_B32 | 0% | 1682 | NaN | NA | 0% | 1751 | NaN | NA | 100% | 0 | 0.20 | 0.40 | 100% | 0 | 0.21 | 0.41 | 100% | 0 | 0.21 | 0.41 |
Before applying IRT, we should check that the data satisfies the assumptions needed by the model. In particular, to use a 1-dimensional IRT model, we should have some evidence of unidimensionality in the test scores.
Because the combined data from the two versions of the test contain large blocks of missing values (e.g. no responses to new questions from students who completed the old test), it is not possible to carry out the factor analysis as in the analyse-test script: the factor analysis routine does not work with missing data.
Instead, in the next section we proceed directly to fitting the IRT model, and using the \(Q_3\) check for local independence. In the final section, we also run a factor analysis for the data from the new test only.
We use the mirt package to fit an item response theory model.
For this analysis, we use the 2 parameter logistic (2PL) model.
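In the 2PL model, each item \(j\) has a discrimination parameter \(a_j\) and a difficulty parameter \(b_j\), and the probability that a student with ability \(\theta\) answers the item correctly is

\[ P(X_j = 1 \mid \theta) = \frac{1}{1 + e^{-a_j(\theta - b_j)}} \]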
# save just the matrix of scores
item_scores <- test_scores %>%
select(matches("^A\\d"))
fit_2pl <- mirt(
data = item_scores, # just the columns with question scores
#removeEmptyRows = TRUE,
model = 1, # number of factors to extract
itemtype = "2PL", # 2 parameter logistic model
SE = TRUE # estimate standard errors
)
##
Iteration: 1, Log-Lik: -186638.319, Max-Change: 4.61593
Iteration: 2, Log-Lik: -173222.852, Max-Change: 3.34257
Iteration: 3, Log-Lik: -172025.296, Max-Change: 0.75461
Iteration: 4, Log-Lik: -171559.893, Max-Change: 0.30299
Iteration: 5, Log-Lik: -171310.677, Max-Change: 0.29565
Iteration: 6, Log-Lik: -171163.155, Max-Change: 0.15660
Iteration: 7, Log-Lik: -171055.541, Max-Change: 0.15756
Iteration: 8, Log-Lik: -170980.492, Max-Change: 0.10202
Iteration: 9, Log-Lik: -170924.986, Max-Change: 0.13257
Iteration: 10, Log-Lik: -170881.349, Max-Change: 0.08812
Iteration: 11, Log-Lik: -170846.140, Max-Change: 0.07497
Iteration: 12, Log-Lik: -170823.782, Max-Change: 0.04927
Iteration: 13, Log-Lik: -170807.153, Max-Change: 0.06731
Iteration: 14, Log-Lik: -170793.659, Max-Change: 0.03988
Iteration: 15, Log-Lik: -170783.377, Max-Change: 0.04419
Iteration: 16, Log-Lik: -170775.383, Max-Change: 0.02498
Iteration: 17, Log-Lik: -170768.350, Max-Change: 0.03335
Iteration: 18, Log-Lik: -170762.835, Max-Change: 0.01562
Iteration: 19, Log-Lik: -170757.759, Max-Change: 0.02149
Iteration: 20, Log-Lik: -170753.836, Max-Change: 0.01406
Iteration: 21, Log-Lik: -170750.152, Max-Change: 0.01118
Iteration: 22, Log-Lik: -170744.133, Max-Change: 0.01296
Iteration: 23, Log-Lik: -170741.889, Max-Change: 0.01003
Iteration: 24, Log-Lik: -170740.015, Max-Change: 0.00919
Iteration: 25, Log-Lik: -170732.968, Max-Change: 0.00538
Iteration: 26, Log-Lik: -170732.507, Max-Change: 0.00457
Iteration: 27, Log-Lik: -170732.168, Max-Change: 0.00410
Iteration: 28, Log-Lik: -170730.815, Max-Change: 0.00252
Iteration: 29, Log-Lik: -170730.746, Max-Change: 0.00278
Iteration: 30, Log-Lik: -170730.686, Max-Change: 0.00187
Iteration: 31, Log-Lik: -170730.559, Max-Change: 0.00125
Iteration: 32, Log-Lik: -170730.538, Max-Change: 0.00104
Iteration: 33, Log-Lik: -170730.519, Max-Change: 0.00099
Iteration: 34, Log-Lik: -170730.435, Max-Change: 0.00068
Iteration: 35, Log-Lik: -170730.427, Max-Change: 0.00035
Iteration: 36, Log-Lik: -170730.424, Max-Change: 0.00018
Iteration: 37, Log-Lik: -170730.419, Max-Change: 0.00019
Iteration: 38, Log-Lik: -170730.417, Max-Change: 0.00018
Iteration: 39, Log-Lik: -170730.416, Max-Change: 0.00016
Iteration: 40, Log-Lik: -170730.408, Max-Change: 0.00013
Iteration: 41, Log-Lik: -170730.408, Max-Change: 0.00013
Iteration: 42, Log-Lik: -170730.407, Max-Change: 0.00013
Iteration: 43, Log-Lik: -170730.403, Max-Change: 0.00056
Iteration: 44, Log-Lik: -170730.401, Max-Change: 0.00046
Iteration: 45, Log-Lik: -170730.399, Max-Change: 0.00046
Iteration: 46, Log-Lik: -170730.398, Max-Change: 0.00029
Iteration: 47, Log-Lik: -170730.397, Max-Change: 0.00011
Iteration: 48, Log-Lik: -170730.397, Max-Change: 0.00005
##
## Calculating information matrix...
We compute Yen’s \(Q_3\) (1984) to check for any dependence between items after controlling for \(\theta\). This gives a score for each pair of items, with absolute values above 0.2 regarded as problematic (see DeMars, p. 48).
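In terms of the model, the residual for student \(i\) on item \(j\) is the observed score minus the score the fitted model expects at the estimated ability, and \(Q_3\) for a pair of items is the correlation of these residuals across students:

\[ d_{ij} = x_{ij} - P_j(\hat{\theta}_i), \qquad Q_{3,jk} = \operatorname{cor}(d_j, d_k) \]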
# compute the Q3 statistics from the fitted model
residuals_2pl <- residuals(fit_2pl, type = "Q3") %>% as.data.frame()
residuals_2pl %>% as.matrix() %>%
  corrplot::corrplot(type = "upper")
This shows that most item pairs are independent, with only a couple of pairs showing cause for concern:
residuals_2pl %>%
rownames_to_column(var = "item1") %>%
as_tibble() %>%
pivot_longer(cols = starts_with("A"), names_to = "item2", values_to = "Q3_score") %>%
filter(abs(Q3_score) > 0.2) %>%
filter(parse_number(item1) < parse_number(item2)) %>%
gt()
| item1 | item2 | Q3_score |
|---|---|---|
| A18_B16 | A19_B0 | 0.6739582 |
| A34_B0 | A35_B0 | 0.2105944 |
In fact, both of these pairs highlight questions that were removed from the test:
A18 and A19 are on the product and quotient rules; only A18 was retained on the new test.
A34 and A35 are both about vectors; only A34 was retained (in modified form, as B30).
Given that this violation of the local independence assumption is very mild, we proceed using this model.
We then compute factor score estimates and augment the existing data frame with these estimates, to keep everything in one place. To do the estimation, we use the fscores() function from the mirt package, which takes a fitted model object and computes factor score estimates according to the specified method. We use the default EAP (“expected a posteriori”) method.
test_scores <- test_scores %>%
mutate(F1 = fscores(fit_2pl, method = "EAP"))
We can also extract the model coefficient estimates with the generic coef() function, which extracts coefficients from objects returned by modelling functions. Setting the IRTpars argument to TRUE converts the slope/intercept parameters into traditional IRT parameters.
coefs_2pl <- coef(fit_2pl, IRTpars = TRUE)
The resulting object coefs_2pl is a list, with one element for each question, and an additional GroupPars element that we won’t be using. For each question, the object records several values:

a is discrimination
b is difficulty

To make this output a little more user friendly, we use the tidy_mirt_coefs function that we have provided in common-functions.R, to produce a single data frame with a row for each question.
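Under the hood, mirt estimates the model in slope/intercept form \(a\theta + d\), and IRTpars = TRUE converts the intercept into a difficulty via \(b = -d/a\). A minimal sketch of that conversion (toy values, not taken from the fitted model):

```r
# convert mirt's slope/intercept parameters (a, d) into traditional
# IRT parameters: the slope a is unchanged, and difficulty b = -d / a
slope_to_irt <- function(a, d) {
  c(a = a, b = -d / a)
}

slope_to_irt(a = 1.5, d = 0.75)  # a = 1.5, b = -0.5
```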
source("common-functions.R")
# use head(., -1) to remove the last element, `GroupPars`, which does not correspond to a question
tidy_2pl <- map_dfr(head(coefs_2pl, -1), tidy_mirt_coefs, .id = "Question") %>%
left_join(test_versions, by = c("Question" = "label"))
## Warning in type.convert.default(X[[i]], ...): 'as.is' should be specified by the
## caller; using TRUE
Here is a nicely formatted table of the result:
tidy_2pl %>%
select(-pre,-post,-notes) %>%
group_by(outcome) %>%
gt() %>%
fmt_number(columns = contains("a_"), decimals = 2) %>%
fmt_number(columns = contains("b_"), decimals = 2) %>%
data_color(
columns = contains("a_"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
) %>%
data_color(
columns = contains("b_"),
colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
) %>%
tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
tab_spanner(label = "Difficulty", columns = contains("b_")) %>%
cols_label(
a_est = "Est.",
b_est = "Est.",
a_CI_2.5 = "2.5%",
b_CI_2.5 = "2.5%",
a_CI_97.5 = "97.5%",
b_CI_97.5 = "97.5%"
)
| Question | Discrimination | Difficulty | description | item_num | ||||
|---|---|---|---|---|---|---|---|---|
| Est. | 2.5% | 97.5% | Est. | 2.5% | 97.5% | |||
| unchanged | ||||||||
| A1_B1 | 1.26 | 1.14 | 1.38 | −2.92 | −3.13 | −2.71 | linear function | 1 |
| A3_B2 | 1.29 | 1.23 | 1.36 | −0.64 | −0.69 | −0.60 | arithmetic rules | 3 |
| A4_B3 | 1.15 | 1.09 | 1.21 | −0.57 | −0.62 | −0.52 | Easy ineq. | 4 |
| A5_B4 | 1.43 | 1.35 | 1.50 | 0.05 | 0.01 | 0.09 | logs | 5 |
| A6_B5 | 1.51 | 1.43 | 1.60 | −0.97 | −1.02 | −0.92 | logs | 6 |
| A7_B6 | 1.37 | 1.30 | 1.44 | −0.82 | −0.87 | −0.77 | graph translation | 7 |
| A8_B7 | 1.18 | 1.11 | 1.24 | −0.18 | −0.22 | −0.13 | sine pi/3 | 8 |
| A9_B8 | 1.98 | 1.89 | 2.08 | 0.17 | 0.14 | 0.20 | trig.ineq. | 9 |
| A10_B9 | 1.74 | 1.66 | 1.83 | −0.16 | −0.19 | −0.12 | trig.identity | 10 |
| A11_B10 | 1.61 | 1.53 | 1.69 | −0.30 | −0.34 | −0.26 | period | 11 |
| A14_B12 | 1.47 | 1.40 | 1.55 | −0.23 | −0.27 | −0.19 | limit | 14 |
| A15_B13 | 0.83 | 0.78 | 0.88 | 0.21 | 0.16 | 0.27 | series | 15 |
| A16_B14 | 1.41 | 1.32 | 1.49 | 1.34 | 1.28 | 1.41 | diff.quotient | 16 |
| A17_B15 | 1.22 | 1.15 | 1.28 | −0.39 | −0.43 | −0.34 | graph f' | 17 |
| A18_B16 | 1.08 | 1.02 | 1.15 | 0.91 | 0.85 | 0.97 | product rule | 18 |
| A20_B17 | 2.18 | 2.06 | 2.29 | −0.75 | −0.78 | −0.71 | (exp)' | 20 |
| A21_B18 | 2.10 | 2.00 | 2.20 | −0.07 | −0.10 | −0.03 | (ln(sin))' | 21 |
| A22_B19 | 1.99 | 1.90 | 2.09 | −0.05 | −0.08 | −0.01 | hyp.functions | 22 |
| A23_B20 | 1.94 | 1.85 | 2.03 | 0.26 | 0.23 | 0.30 | slope tangent | 23 |
| A24_B21 | 1.41 | 1.34 | 1.49 | −0.31 | −0.35 | −0.27 | IVT | 24 |
| A26_B22 | 2.38 | 2.25 | 2.51 | −1.06 | −1.10 | −1.02 | int(poly) | 26 |
| A27_B23 | 1.89 | 1.80 | 1.99 | 0.32 | 0.29 | 0.36 | int(exp) | 27 |
| A28_B24 | 1.85 | 1.76 | 1.94 | 0.23 | 0.20 | 0.27 | Int = 0 | 28 |
| A29_B25 | 0.83 | 0.78 | 0.88 | 0.00 | −0.05 | 0.06 | int even funct | 29 |
| A31_B27 | 1.29 | 1.23 | 1.36 | 0.45 | 0.40 | 0.49 | int(abs) | 30 |
| A32_B28 | 1.52 | 1.44 | 1.61 | 1.22 | 1.16 | 1.28 | FtoC algebra | 31 |
| A33_B29 | 0.99 | 0.93 | 1.06 | −1.37 | −1.46 | −1.29 | difference vectors | 32 |
| removed | ||||||||
| A2_B0 | 0.55 | 0.47 | 0.64 | 1.69 | 1.43 | 1.96 | 3D | 2 |
| A12_B0 | 0.88 | 0.78 | 0.97 | −1.11 | −1.24 | −0.98 | rational exponents | 12 |
| A13_B0 | 0.71 | 0.62 | 0.79 | 0.86 | 0.73 | 1.00 | ellipsoid | 13 |
| A19_B0 | 1.27 | 1.16 | 1.38 | 0.65 | 0.58 | 0.73 | quotient rule | 19 |
| A25_B0 | 0.53 | 0.45 | 0.61 | −1.14 | −1.35 | −0.94 | velocity | 25 |
| A30_B0 | 1.25 | 1.13 | 1.37 | −1.10 | −1.20 | −1.00 | FtoC graph | 35 |
| A34_B0 | 0.89 | 0.80 | 0.99 | −0.52 | −0.62 | −0.43 | normal vector | 36 |
| A35_B0 | 1.26 | 1.15 | 1.37 | 0.74 | 0.66 | 0.82 | intersect planes | 33 |
| A36_B0 | 1.23 | 1.11 | 1.35 | 1.30 | 1.19 | 1.41 | parallel planes | 34 |
| added | ||||||||
| A0_B11 | 0.77 | 0.70 | 0.84 | −0.48 | −0.56 | −0.40 | rational exponents | 37 |
| A0_B26 | 1.08 | 1.01 | 1.16 | −0.15 | −0.20 | −0.09 | FtoC graph | 40 |
| A0_B30 | 0.81 | 0.75 | 0.88 | −0.59 | −0.67 | −0.51 | normal vector | 41 |
| A0_B31 | 1.01 | 0.94 | 1.09 | 0.62 | 0.55 | 0.69 | vector product | 38 |
| A0_B32 | 1.32 | 1.22 | 1.41 | 1.32 | 1.24 | 1.41 | scalar product | 39 |
These values are also saved to the file output/eth_post_2pl-results.csv.
tidy_2pl %>%
write_csv("output/eth_post_2pl-results.csv")
This shows the difficulty and discrimination of each of the questions, highlighting those that were added or removed:
tidy_2pl %>%
mutate(qnum = parse_number(Question)) %>%
ggplot(aes(
x = a_est,
y = b_est,
label = case_when(
outcome == "unchanged" ~ "",
outcome == "removed" ~ pre,
outcome == "added" ~ post
),
colour = outcome
)) +
geom_errorbar(aes(ymin = b_CI_2.5, ymax = b_CI_97.5), width = 0, alpha = 0.5) +
geom_errorbar(aes(xmin = a_CI_2.5, xmax = a_CI_97.5), width = 0, alpha = 0.5) +
geom_text_repel() +
geom_point() +
theme_minimal() +
labs(x = "Discrimination",
y = "Difficulty")
Do students from different cohorts have different distributions of ability? We compare the distribution of abilities across the year groups:
ggplot(test_scores, aes(F1, y = year, fill = as.factor(year), colour = as.factor(year))) +
geom_density_ridges(alpha=0.5) +
scale_x_continuous(limits = c(-3.5,3.5)) +
labs(title = "Density plot",
subtitle = "Ability grouped by year of taking the test",
x = "Ability", y = "Density",
fill = "Year", colour = "Year") +
theme_minimal()
## Picking joint bandwidth of 0.189
theta <- seq(-6, 6, by=0.05)
info_matrix <- testinfo(fit_2pl, theta, individual = TRUE)
colnames(info_matrix) <- test_versions %>% pull(label)
item_info_data <- info_matrix %>%
as_tibble() %>%
bind_cols(theta = theta) %>%
pivot_longer(cols = -theta, names_to = "item", values_to = "info_y") %>%
mutate(item = fct_reorder(item, parse_number(item)))
plot(fit_2pl, type = "infoSE", main = "Test information")
The information curves for each question highlight the most and least informative questions:
item_info_data %>%
ggplot(aes(x = theta, y = info_y, colour = item)) +
geom_line() +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
facet_wrap(vars(item)) +
labs(y = "Information") +
theme_minimal()
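For a 2PL item the information function has the closed form \(I(\theta) = a^2\,P(\theta)\,(1 - P(\theta))\), which peaks at θ = b with maximum a²/4. A minimal base-R sketch of the same quantity that testinfo computes per item (the parameter values are illustrative, not from fit_2pl):

```r
# 2PL item information: I(theta) = a^2 * P * (1 - P), maximised at theta = b
info_2pl <- function(theta, a, b) {
  p <- plogis(a * (theta - b))
  a^2 * p * (1 - p)
}

theta <- seq(-6, 6, by = 0.05)
# Illustrative parameters (not values from fit_2pl)
i1 <- info_2pl(theta, a = 1.5, b = 0)

# The curve peaks near theta = b, with maximum a^2 / 4
theta[which.max(i1)]   # close to 0
max(i1)                # close to 1.5^2 / 4 = 0.5625
```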
Using mirt’s areainfo function, we can find
the total area under the information curves.
info_irt <- areainfo(fit_2pl, c(-4,4))
info_irt %>% gt()
| LowerBound | UpperBound | Info | TotalInfo | Proportion | nitems |
|---|---|---|---|---|---|
| -4 | 4 | 52.88075 | 54.46439 | 0.9709234 | 41 |
This shows that the total information across all 41 items is 54.46, of which 52.88 (97.1%) lies in the range \(-4\leq\theta\leq4\).
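A useful sanity check on areainfo: for a 2PL item the total area under the information curve over the whole real line equals the discrimination, since \(\int a^2 P(1-P)\,d\theta = a\). This is also why the total information for a test should be approximately the sum of the estimated discriminations. A base-R numerical sketch with illustrative parameters (not values from fit_2pl):

```r
# 2PL item information function
info_2pl <- function(theta, a, b) {
  p <- plogis(a * (theta - b))
  a^2 * p * (1 - p)
}

# Illustrative parameters; the integral over the real line should equal a
a <- 1.3
b <- -0.4
area <- integrate(info_2pl, lower = -Inf, upper = Inf, a = a, b = b)$value
area  # approximately 1.3
```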
tidy_info <- test_versions %>%
mutate(item_num = row_number()) %>%
mutate(TotalInfo = purrr::map_dbl(
item_num,
~ areainfo(fit_2pl,
c(-4, 4),
which.items = .x) %>% pull(TotalInfo)
))
tidy_info %>%
select(-item_num) %>%
arrange(-TotalInfo) %>%
#group_by(outcome) %>%
gt() %>%
sub_missing(columns = one_of("notes", "pre", "post"), missing_text = "") %>%
fmt_number(columns = contains("a_"), decimals = 2) %>%
fmt_number(columns = contains("b_"), decimals = 2) %>%
data_color(
columns = contains("info"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
) %>%
data_color(
columns = contains("outcome"),
colors = scales::col_factor(palette = c("viridis"), domain = NULL)
) %>%
cols_label(
TotalInfo = "Information"
)
| pre | post | description | notes | label | outcome | Information |
|---|---|---|---|---|---|---|
| A26 | B22 | int(poly) |  | A26_B22 | unchanged | 2.3802120 |
| A20 | B17 | (exp)' |  | A20_B17 | unchanged | 2.1756893 |
| A21 | B18 | (ln(sin))' |  | A21_B18 | unchanged | 2.0980069 |
| A22 | B19 | hyp.functions |  | A22_B19 | unchanged | 1.9933525 |
| A9 | B8 | trig.ineq. |  | A9_B8 | unchanged | 1.9831140 |
| A23 | B20 | slope tangent |  | A23_B20 | unchanged | 1.9408823 |
| A27 | B23 | int(exp) |  | A27_B23 | unchanged | 1.8936276 |
| A28 | B24 | Int = 0 |  | A28_B24 | unchanged | 1.8527290 |
| A10 | B9 | trig.identity |  | A10_B9 | unchanged | 1.7433688 |
| A11 | B10 | period |  | A11_B10 | unchanged | 1.6077264 |
| A33 | B29 | difference vectors |  | A33_B29 | unchanged | 1.5215297 |
| A6 | B5 | logs |  | A6_B5 | unchanged | 1.5147947 |
| A14 | B12 | limit |  | A14_B12 | unchanged | 1.4742547 |
| A5 | B4 | logs |  | A5_B4 | unchanged | 1.4257166 |
| A24 | B21 | IVT |  | A24_B21 | unchanged | 1.4143219 |
| A16 | B14 | diff.quotient |  | A16_B14 | unchanged | 1.4056184 |
| A7 | B6 | graph translation |  | A7_B6 | unchanged | 1.3700318 |
|  | B30 | normal vector | updated version of A34 | A0_B30 | added | 1.3176397 |
| A3 | B2 | arithmetic rules |  | A3_B2 | unchanged | 1.2944966 |
| A32 | B28 | FtoC algebra |  | A32_B28 | unchanged | 1.2929143 |
| A19 |  | quotient rule |  | A19_B0 | removed | 1.2709396 |
| A1 | B1 | linear function |  | A1_B1 | unchanged | 1.2607121 |
| A30 |  | FtoC graph | adjusted to B26 | A30_B0 | removed | 1.2590765 |
| A31 | B27 | int(abs) |  | A31_B27 | unchanged | 1.2511116 |
| A34 |  | normal vector | adjusted to B30 | A34_B0 | removed | 1.2255439 |
| A17 | B15 | graph f' |  | A17_B15 | unchanged | 1.2156068 |
| A8 | B7 | sine pi/3 |  | A8_B7 | unchanged | 1.1759131 |
| A4 | B3 | Easy ineq. |  | A4_B3 | unchanged | 1.1498403 |
|  | B31 | vector product |  | A0_B31 | added | 1.0840039 |
| A18 | B16 | product rule |  | A18_B16 | unchanged | 1.0825203 |
|  | B26 | FtoC graph | updated version of A30 | A0_B26 | added | 1.0127943 |
| A35 |  | intersect planes |  | A35_B0 | removed | 0.9911898 |
| A36 |  | parallel planes |  | A36_B0 | removed | 0.8938061 |
| A12 |  | rational exponents | adjusted to B11 | A12_B0 | removed | 0.8758781 |
| A29 | B25 | int even funct |  | A29_B25 | unchanged | 0.8301225 |
| A15 | B13 | series |  | A15_B13 | unchanged | 0.8287906 |
|  | B32 | scalar product |  | A0_B32 | added | 0.8117541 |
|  | B11 | rational exponents | updated version of A12 | A0_B11 | added | 0.7690007 |
| A13 |  | ellipsoid |  | A13_B0 | removed | 0.7074127 |
| A2 |  | 3D |  | A2_B0 | removed | 0.5473019 |
| A25 |  | velocity |  | A25_B0 | removed | 0.5210397 |
Restricting instead to the range \(-2\leq\theta\leq2\):
tidy_info <- test_versions %>%
mutate(item_num = row_number()) %>%
mutate(TotalInfo = purrr::map_dbl(
item_num,
~ areainfo(fit_2pl,
c(-2, 2),
which.items = .x) %>% pull(Info)
))
tidy_info %>%
select(-item_num) %>%
arrange(-TotalInfo) %>%
#group_by(outcome) %>%
gt() %>%
sub_missing(columns = one_of("notes", "pre", "post"), missing_text = "") %>%
fmt_number(columns = contains("a_"), decimals = 2) %>%
fmt_number(columns = contains("b_"), decimals = 2) %>%
data_color(
columns = contains("info"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
) %>%
data_color(
columns = contains("outcome"),
colors = scales::col_factor(palette = c("viridis"), domain = NULL)
) %>%
cols_label(
TotalInfo = "Information"
)
| pre | post | description | notes | label | outcome | Information |
|---|---|---|---|---|---|---|
| A26 | B22 | int(poly) |  | A26_B22 | unchanged | 2.1496515 |
| A20 | B17 | (exp)' |  | A20_B17 | unchanged | 2.0368241 |
| A21 | B18 | (ln(sin))' |  | A21_B18 | unchanged | 2.0351851 |
| A22 | B19 | hyp.functions |  | A22_B19 | unchanged | 1.9204031 |
| A9 | B8 | trig.ineq. |  | A9_B8 | unchanged | 1.9053008 |
| A23 | B20 | slope tangent |  | A23_B20 | unchanged | 1.8528009 |
| A27 | B23 | int(exp) |  | A27_B23 | unchanged | 1.7945879 |
| A28 | B24 | Int = 0 |  | A28_B24 | unchanged | 1.7559052 |
| A10 | B9 | trig.identity |  | A10_B9 | unchanged | 1.6363424 |
| A11 | B10 | period |  | A11_B10 | unchanged | 1.4708151 |
| A14 | B12 | limit |  | A14_B12 | unchanged | 1.3203486 |
| A5 | B4 | logs |  | A5_B4 | unchanged | 1.2696657 |
| A24 | B21 | IVT |  | A24_B21 | unchanged | 1.2437250 |
| A6 | B5 | logs |  | A6_B5 | unchanged | 1.2360528 |
| A33 | B29 | difference vectors |  | A33_B29 | unchanged | 1.1534582 |
| A7 | B6 | graph translation |  | A7_B6 | unchanged | 1.1156355 |
| A32 | B28 | FtoC algebra |  | A32_B28 | unchanged | 1.0872317 |
| A3 | B2 | arithmetic rules |  | A3_B2 | unchanged | 1.0630819 |
| A19 |  | quotient rule |  | A19_B0 | removed | 1.0346021 |
| A30 |  | FtoC graph | adjusted to B26 | A30_B0 | removed | 1.0059374 |
| A17 | B15 | graph f' |  | A17_B15 | unchanged | 1.0022925 |
| A16 | B14 | diff.quotient |  | A16_B14 | unchanged | 0.9940931 |
| A8 | B7 | sine pi/3 |  | A8_B7 | unchanged | 0.9682115 |
| A31 | B27 | int(abs) |  | A31_B27 | unchanged | 0.9193184 |
|  | B30 | normal vector | updated version of A34 | A0_B30 | added | 0.9180818 |
| A4 | B3 | Easy ineq. |  | A4_B3 | unchanged | 0.9070485 |
|  | B31 | vector product |  | A0_B31 | added | 0.8594470 |
| A34 |  | normal vector | adjusted to B30 | A34_B0 | removed | 0.8404270 |
| A18 | B16 | product rule |  | A18_B16 | unchanged | 0.7834848 |
|  | B26 | FtoC graph | updated version of A30 | A0_B26 | added | 0.7456596 |
| A36 |  | parallel planes |  | A36_B0 | removed | 0.6208678 |
| A35 |  | intersect planes |  | A35_B0 | removed | 0.6115641 |
| A29 | B25 | int even funct |  | A29_B25 | unchanged | 0.5653949 |
| A15 | B13 | series |  | A15_B13 | unchanged | 0.5615278 |
| A12 |  | rational exponents | adjusted to B11 | A12_B0 | removed | 0.5473130 |
|  | B32 | scalar product |  | A0_B32 | added | 0.5278274 |
|  | B11 | rational exponents | updated version of A12 | A0_B11 | added | 0.4880861 |
| A13 |  | ellipsoid |  | A13_B0 | removed | 0.4075948 |
| A1 | B1 | linear function |  | A1_B1 | unchanged | 0.2991127 |
| A25 |  | velocity |  | A25_B0 | removed | 0.2378399 |
| A2 |  | 3D |  | A2_B0 | removed | 0.2369482 |
info_comparison_data <- item_info_data %>%
mutate(item = as.character(item)) %>%
left_join(test_versions %>% mutate(item = as.character(label)), by = "item") %>%
group_by(theta) %>%
summarise(
items_pre = sum(ifelse(outcome %in% c("unchanged", "removed"), info_y, 0)),
items_post = sum(ifelse(outcome %in% c("unchanged", "added"), info_y, 0)),
items_added = sum(ifelse(outcome %in% c("added"), info_y, 0)),
items_removed = sum(ifelse(outcome %in% c("removed"), info_y, 0)),
n_added = sum(ifelse(outcome %in% c("added"), 1, 0)),
items_added_mean = sum(ifelse(outcome %in% c("added"), info_y, 0)) / n_added,
items_removed_mean = sum(ifelse(outcome %in% c("removed"), info_y, 0)) / sum(ifelse(outcome %in% c("removed"), 1, 0))
) %>%
pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y")
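This aggregation relies on test information being additive across items: at any ability level, the information of a subset of items is simply the sum of those items' information values. A minimal base-R sketch of the same grouping logic, using hypothetical item names and values (not the actual test data):

```r
# Toy item-information values at a single ability level (hypothetical data)
items <- data.frame(
  item    = c("q1", "q2", "q3", "q4"),
  outcome = c("unchanged", "removed", "added", "unchanged"),
  info_y  = c(0.8, 0.5, 0.6, 1.1)
)

# Pre-test = unchanged + removed items; post-test = unchanged + added items
items_pre  <- sum(items$info_y[items$outcome %in% c("unchanged", "removed")])
items_post <- sum(items$info_y[items$outcome %in% c("unchanged", "added")])

c(pre = items_pre, post = items_post)  # pre = 2.4, post = 2.5
```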
test_info_total_plot <- info_comparison_data %>%
filter(items %in% c("pre", "post")) %>%
mutate(items = case_when(items == "pre" ~ "Version A", items == "post" ~ "Version B")) %>%
#mutate(items = fct_relevel(items, "pre", "post")) %>%
ggplot(aes(x = theta, y = info_y, colour = items, linetype = items)) +
geom_line() +
scale_colour_brewer("ETH s21t", palette = "Set1") +
scale_linetype_manual("ETH s21t", values = c("dashed", "solid")) +
labs(x = "Ability", y = "Information") +
theme_minimal() +
theme(legend.position = "top",
legend.title = element_text(face = "bold"))
test_info_total_plot
ggsave("output/eth_pre-vs-post_info.pdf", units = "cm", width = 8, height = 8)
test_info_changes_plot <- info_comparison_data %>%
filter(!items %in% c("pre", "post")) %>%
mutate(panel = ifelse(str_detect(items, "_mean"), "Mean information\nper question", "Total information")) %>%
mutate(panel = fct_rev(panel)) %>%
mutate(items = str_remove(items, "_mean")) %>%
#mutate(items = fct_relevel(items, "pre", "post")) %>%
ggplot(aes(x = theta, y = info_y, colour = items, linetype = items)) +
geom_line() +
scale_colour_brewer("Questions", palette = "Paired", direction = -1) +
scale_linetype_manual("Questions", values = c("solid", "dashed")) +
facet_wrap(vars(panel), scales = "free_y") +
labs(x = "Ability", y = "Information") +
theme_minimal() +
theme(legend.position = "top")
test_info_changes_plot
ggsave("output/eth_pre-vs-post_changes.pdf", units = "cm", width = 16, height = 8)
test_info_total_plot + test_info_changes_plot + patchwork::plot_layout(widths = c(2, 3))
ggsave("output/eth_pre-vs-post_info-summary.pdf", units = "cm", width = 16, height = 8)
This shows that the added questions offer, on average, slightly more information than the questions they replaced (though there are fewer of them, hence the lower total information shown above). The difference is largest in the ability range 0 to 2.
trace_data <- probtrace(fit_2pl, theta) %>%
as_tibble() %>%
bind_cols(theta = theta) %>%
pivot_longer(cols = -theta, names_to = "level", values_to = "y") %>%
separate(level, into = c("item", NA, "score"), sep = "\\.")
trace_data %>%
mutate(item = fct_reorder(item, parse_number(item))) %>%
ggplot(aes(x = theta, y = y, colour = score)) +
geom_line() +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
facet_wrap(vars(item)) +
labs(y = "Probability of response") +
theme_minimal()
This plot combines all the trace lines onto a single set of axes:
plt <- trace_data %>%
filter(score == 1) %>%
left_join(test_versions %>% mutate(item = as.factor(label)), by = "item") %>%
mutate(item_removed = (outcome == "removed")) %>%
ggplot(aes(x = theta, y = y, colour = item, text = item)) +
geom_line(aes(linetype = outcome)) +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
scale_linetype_manual("outcome", values = c("unchanged" = "solid", "removed" = "dashed", "added" = "twodash")) +
labs(y = "Expected score") +
theme_minimal()
ggplotly(plt, tooltip = "text")
ggsave(plot = plt, file = "output/eth_pre-vs-post_iccs-superimposed.pdf", width = 20, height = 14, units = "cm")
That is quite a busy plot, so here we focus only on the changes:
plt <- trace_data %>%
filter(score == 1) %>%
left_join(test_versions %>% mutate(item = as.factor(label)), by = "item") %>%
filter(outcome != "unchanged") %>%
mutate(item_removed = (outcome == "removed")) %>%
ggplot(aes(x = theta, y = y, colour = item, text = item)) +
geom_line(aes(linetype = outcome)) +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
scale_linetype_manual("outcome", values = c("removed" = "dashed", "added" = "solid")) +
labs(y = "Expected score") +
theme_minimal()
ggplotly(plt, tooltip = "text")
ggsave(plot = plt, file = "output/eth_pre-vs-post_iccs-superimposed-highlight.pdf", width = 20, height = 14, units = "cm")
Here we redo the factor analysis, but using only the data from the new version of the test.
item_scores_B <- test_scores %>%
select(-F1) %>%
select(-contains("B0")) %>%
filter(test_version == "post") %>%
select(-class, -year, -test_version)
The parameters package provides functions that check whether the data are suitable for factor analysis and, if so, suggest how many factors should be retained.
structure <- check_factorstructure(item_scores_B)
n <- n_factors(item_scores_B)
The choice of 4 dimensions is supported by 5 (26.32%) methods out of 19 (beta, Optimal coordinates, Parallel analysis, Kaiser criterion, Scree (SE)).
plot(n)
summary(n) %>% gt()
| n_Factors | n_Methods |
|---|---|
| 1 | 4 |
| 2 | 1 |
| 3 | 2 |
| 4 | 5 |
| 5 | 1 |
| 6 | 1 |
| 12 | 2 |
| 29 | 3 |
#n %>% tibble() %>% gt()
The scree plot shows the eigenvalues associated with each factor:
fa.parallel(item_scores_B, fa = "fa")
## Parallel analysis suggests that the number of factors = 7 and the number of components = NA
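Several of the methods consulted by n_factors come down to examining the eigenvalues of the correlation matrix; the Kaiser criterion, for instance, retains factors with eigenvalue greater than 1. A base-R sketch on simulated data (the data here are simulated with one common factor, not the test scores):

```r
set.seed(1)
# Simulate 500 observations of 6 items driven by a single latent trait
latent <- rnorm(500)
sim_items <- sapply(1:6, function(i) 0.7 * latent + rnorm(500))

# Eigenvalues of the item correlation matrix (they sum to the number of items)
ev <- eigen(cor(sim_items))$values

# Kaiser criterion: retain factors with eigenvalue > 1
sum(ev > 1)  # with one strong common factor, typically 1
```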
Based on the scree plot, there is clear support for a dominant first factor, so we fit a 1-factor solution. We also consider the 4-factor solution suggested by n_factors.
fitfact <- factanal(item_scores_B, factors = 1, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
##
## Call:
## factanal(x = item_scores_B, factors = 1, rotation = "varimax")
##
## Uniquenesses:
## A1_B1 A3_B2 A4_B3 A5_B4 A6_B5 A7_B6 A8_B7 A9_B8 A10_B9 A11_B10
## 0.95 0.78 0.80 0.73 0.77 0.76 0.78 0.62 0.66 0.70
## A14_B12 A15_B13 A16_B14 A17_B15 A18_B16 A20_B17 A21_B18 A22_B19 A23_B20 A24_B21
## 0.71 0.88 0.83 0.78 0.86 0.66 0.60 0.61 0.62 0.73
## A26_B22 A27_B23 A28_B24 A29_B25 A31_B27 A32_B28 A33_B29 A0_B11 A0_B26 A0_B30
## 0.70 0.65 0.64 0.88 0.76 0.80 0.86 0.89 0.81 0.88
## A0_B31 A0_B32
## 0.84 0.83
##
## Loadings:
## [1] 0.52 0.62 0.58 0.55 0.54 0.58 0.63 0.62 0.62 0.52 0.54 0.59 0.60 0.47
## [16] 0.45 0.48 0.49 0.47 0.35 0.41 0.47 0.38 0.34 0.49 0.45 0.37 0.33 0.44 0.34
## [31] 0.41 0.41
##
## Factor1
## SS loadings 7.63
## Proportion Var 0.24
##
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 3991.49 on 464 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)
ggplot(load, aes(x = fl1, y = 0)) +
geom_point() +
geom_label_repel(aes(label = variable), show.legend = FALSE) +
labs(x = "Factor 1", y = NULL,
title = "Standardised Loadings",
subtitle = "Based upon correlation matrix") +
theme_minimal()
load %>%
select(question = variable, factor_loading = fl1) %>%
left_join(test_versions %>% select(question = label, description), by = "question") %>%
arrange(-factor_loading) %>%
gt() %>%
data_color(
columns = contains("factor"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
)
| question | factor_loading | description |
|---|---|---|
| A21_B18 | 0.6335602 | (ln(sin))' |
| A22_B19 | 0.6229067 | hyp.functions |
| A9_B8 | 0.6173382 | trig.ineq. |
| A23_B20 | 0.6167307 | slope tangent |
| A28_B24 | 0.5989062 | Int = 0 |
| A27_B23 | 0.5934801 | int(exp) |
| A20_B17 | 0.5839634 | (exp)' |
| A10_B9 | 0.5823615 | trig.identity |
| A11_B10 | 0.5489394 | period |
| A26_B22 | 0.5440885 | int(poly) |
| A14_B12 | 0.5367772 | limit |
| A5_B4 | 0.5215381 | logs |
| A24_B21 | 0.5153658 | IVT |
| A31_B27 | 0.4918262 | int(abs) |
| A7_B6 | 0.4910048 | graph translation |
| A6_B5 | 0.4757299 | logs |
| A3_B2 | 0.4702819 | arithmetic rules |
| A8_B7 | 0.4697829 | sine pi/3 |
| A17_B15 | 0.4670353 | graph f' |
| A32_B28 | 0.4470902 | FtoC algebra |
| A4_B3 | 0.4454821 | Easy ineq. |
| A0_B26 | 0.4373808 | FtoC graph |
| A16_B14 | 0.4142581 | diff.quotient |
| A0_B32 | 0.4094848 | scalar product |
| A0_B31 | 0.4058509 | vector product |
| A18_B16 | 0.3802267 | product rule |
| A33_B29 | 0.3700329 | difference vectors |
| A15_B13 | 0.3531196 | series |
| A0_B30 | 0.3422352 | normal vector |
| A29_B25 | 0.3420163 | int even funct |
| A0_B11 | 0.3327412 | rational exponents |
| A1_B1 | 0.2246888 | linear function |
The questions that load most strongly on this factor are very standard calculations in integration, differentiation, and trigonometry – which is consistent with the aim of the test.
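In a 1-factor model each item's uniqueness is simply 1 minus its squared loading, so the loadings table above doubles as a communality table. A quick check using the reported loading for A21_B18:

```r
# For a single-factor model: uniqueness = 1 - loading^2
loading_A21_B18 <- 0.6335602  # from the table above
uniqueness <- 1 - loading_A21_B18^2
round(uniqueness, 2)  # 0.6, matching the 0.60 reported by factanal
```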
TODO - add comment saying it is disappointing that the new questions are all at the low end of this table?
Here we also investigate the 4-factor solution, to see whether these factors are interpretable.
fitfact4 <- factanal(item_scores_B, factors = 4, rotation = "varimax")
print(fitfact4, digits = 2, cutoff = 0.3, sort = TRUE)
##
## Call:
## factanal(x = item_scores_B, factors = 4, rotation = "varimax")
##
## Uniquenesses:
## A1_B1 A3_B2 A4_B3 A5_B4 A6_B5 A7_B6 A8_B7 A9_B8 A10_B9 A11_B10
## 0.90 0.73 0.73 0.71 0.75 0.72 0.76 0.58 0.64 0.69
## A14_B12 A15_B13 A16_B14 A17_B15 A18_B16 A20_B17 A21_B18 A22_B19 A23_B20 A24_B21
## 0.69 0.85 0.75 0.72 0.84 0.54 0.53 0.58 0.62 0.63
## A26_B22 A27_B23 A28_B24 A29_B25 A31_B27 A32_B28 A33_B29 A0_B11 A0_B26 A0_B30
## 0.55 0.61 0.63 0.83 0.72 0.76 0.81 0.88 0.72 0.85
## A0_B31 A0_B32
## 0.79 0.75
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4
## A20_B17 0.56
## A26_B22 0.31 0.57
## A1_B1
## A3_B2 0.35 0.34
## A4_B3 0.31 0.40
## A5_B4 0.41
## A6_B5 0.32
## A7_B6 0.42
## A8_B7 0.37
## A9_B8 0.42 0.36 0.32
## A10_B9 0.46
## A11_B10 0.34 0.36
## A14_B12 0.40
## A15_B13
## A16_B14 0.44
## A17_B15 0.43
## A18_B16
## A21_B18 0.44 0.47
## A22_B19 0.40 0.41
## A23_B20 0.39
## A24_B21 0.47 0.33
## A27_B23 0.37 0.41
## A28_B24 0.32 0.34
## A29_B25 0.36
## A31_B27 0.33
## A32_B28 0.33
## A33_B29 0.38
## A0_B11
## A0_B26 0.39 0.33
## A0_B30
## A0_B31 0.33
## A0_B32 0.41
##
## Factor1 Factor2 Factor3 Factor4
## SS loadings 2.82 2.54 2.20 1.57
## Proportion Var 0.09 0.08 0.07 0.05
## Cumulative Var 0.09 0.17 0.24 0.29
##
## Test of the hypothesis that 4 factors are sufficient.
## The chi square statistic is 1203.85 on 374 degrees of freedom.
## The p-value is 7.2e-88
load4 <- tidy(fitfact4)
ggplot(load4, aes(x = fl1, y = fl2)) +
geom_point() +
geom_label_repel(aes(label = variable), show.legend = FALSE) +
labs(x = "Factor 1", y = "Factor 2",
title = "Standardised Loadings",
subtitle = "Based upon correlation matrix") +
theme_minimal()
main_factors <- load4 %>%
# mutate(factorNone = 0.4) %>% # add this to set the main factor to "None" where all loadings are below 0.4
pivot_longer(names_to = "factor",
cols = contains("fl")) %>%
mutate(value_abs = abs(value)) %>%
group_by(variable) %>%
top_n(1, value_abs) %>%
ungroup() %>%
transmute(main_factor = factor, variable)
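The main_factors step assigns each question to the factor on which it loads most strongly in absolute value. The same selection can be sketched in base R with a small hypothetical loadings matrix (the names and values here are made up for illustration):

```r
# Hypothetical loadings matrix: 3 questions x 2 factors
loadings_mat <- matrix(c( 0.46,  0.15,
                          0.12, -0.52,
                          0.33,  0.30),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(c("q1", "q2", "q3"), c("fl1", "fl2")))

# For each question, pick the factor with the largest absolute loading
main_factor <- colnames(loadings_mat)[apply(abs(loadings_mat), 1, which.max)]
setNames(main_factor, rownames(loadings_mat))  # q1 = fl1, q2 = fl2, q3 = fl1
```

Note that taking the absolute value matters: q2 is assigned to fl2 even though its loading there is negative.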
load4 %>%
select(-uniqueness) %>%
# add the info about which is the main factor
left_join(main_factors, by = "variable") %>%
left_join(test_versions %>% select(variable = label, description), by = "variable") %>%
arrange(main_factor) %>%
select(main_factor, everything()) %>%
# arrange adjectives by descending loading on main factor
rowwise() %>%
mutate(max_loading = max(abs(c_across(starts_with("fl"))))) %>%
group_by(main_factor) %>%
arrange(-max_loading, .by_group = TRUE) %>%
select(-max_loading) %>%
# sort out the presentation
rename("Main Factor" = main_factor, # the _ throws a latex error
"Question" = variable) %>%
mutate_at(
vars(starts_with("fl")),
~ cell_spec(round(., digits = 3), bold = if_else(abs(.) > 0.4, T, F))
) %>%
kable(booktabs = T, escape = F, longtable = T) %>%
kableExtra::collapse_rows(columns = 1, valign = "top") %>%
kableExtra::kable_styling(latex_options = c("repeat_header"))
| Main Factor | Question | fl1 | fl2 | fl3 | fl4 | description |
|---|---|---|---|---|---|---|
| fl1 | A10_B9 | 0.456 | 0.276 | 0.229 | 0.157 | trig.identity |
| fl1 | A16_B14 | 0.44 | 0.03 | 0.131 | 0.206 | diff.quotient |
| fl1 | A9_B8 | 0.416 | 0.361 | 0.128 | 0.324 | trig.ineq. |
| fl1 | A5_B4 | 0.412 | 0.264 | 0.195 | 0.126 | logs |
| fl1 | A14_B12 | 0.403 | 0.225 | 0.277 | 0.13 | limit |
| fl1 | A23_B20 | 0.393 | 0.268 | 0.287 | 0.274 | slope tangent |
| fl1 | A8_B7 | 0.374 | 0.224 | 0.16 | 0.145 | sine pi/3 |
| fl1 | A3_B2 | 0.346 | 0.34 | 0.175 | 0.022 | arithmetic rules |
| fl1 | A32_B28 | 0.326 | 0.082 | 0.208 | 0.288 | FtoC algebra |
| fl1 | A18_B16 | 0.269 | 0.098 | 0.193 | 0.2 | product rule |
| fl2 | A24_B21 | 0.137 | 0.47 | 0.136 | 0.334 | IVT |
| fl2 | A17_B15 | 0.152 | 0.43 | 0.139 | 0.23 | graph f’ |
| fl2 | A7_B6 | 0.213 | 0.422 | 0.2 | 0.13 | graph translation |
| fl2 | A4_B3 | 0.309 | 0.4 | 0.057 | 0.085 | Easy ineq. |
| fl2 | A0_B26 | 0.115 | 0.388 | 0.086 | 0.333 | FtoC graph |
| fl2 | A33_B29 | 0.085 | 0.383 | 0.138 | 0.143 | difference vectors |
| fl2 | A11_B10 | 0.339 | 0.362 | 0.194 | 0.175 | period |
| fl2 | A6_B5 | 0.267 | 0.318 | 0.264 | 0.067 | logs |
| fl2 | A1_B1 | 0.025 | 0.28 | 0.143 | -0.011 | linear function |
| fl2 | A15_B13 | 0.202 | 0.256 | 0.049 | 0.201 | series |
| fl2 | A0_B11 | 0.175 | 0.238 | 0.082 | 0.17 | rational exponents |
| fl3 | A26_B22 | 0.122 | 0.308 | 0.568 | 0.115 | int(poly) |
| fl3 | A20_B17 | 0.276 | 0.261 | 0.556 | 0.063 | (exp)’ |
| fl3 | A21_B18 | 0.438 | 0.185 | 0.473 | 0.145 | (ln(sin))’ |
| fl3 | A22_B19 | 0.402 | 0.197 | 0.414 | 0.22 | hyp.functions |
| fl3 | A27_B23 | 0.368 | 0.162 | 0.407 | 0.251 | int(exp) |
| fl3 | A29_B25 | 0.143 | 0.052 | 0.356 | 0.147 | int even funct |
| fl3 | A0_B30 | 0.093 | 0.135 | 0.26 | 0.232 | normal vector |
| fl4 | A0_B32 | 0.229 | 0.108 | 0.125 | 0.41 | scalar product |
| fl4 | A28_B24 | 0.322 | 0.287 | 0.26 | 0.343 | Int = 0 |
| fl4 | A0_B31 | 0.112 | 0.178 | 0.246 | 0.328 | vector product |
| fl4 | A31_B27 | 0.293 | 0.283 | 0.097 | 0.326 | int(abs) |
TODO add comments from Meike here
The first factor is again calculations, but this time only in calculus (i.e. integrals and derivatives).
The second factor seems to be something like "abstract stuff": it has to do with limits, rules for logarithms, etc. That could perhaps be a category of its own.
The third one is interesting: it is clearly graphical interpretation, in different settings but clearly belonging together.
I cannot make sense of the fourth factor.
This report supports the analysis in the following paper:
[citation needed]
In this analysis we used the following packages. You can learn more about each one by clicking on the links below.